A 0.9-V, 150-MHz, 10-mW, 4-mm², 2-D
Discrete Cosine Transform Core Processor with
Variable Threshold-Voltage (VT) Scheme

Tadahiro Kuroda, Member, IEEE, Tetsuya Fujita, Shinji Mita, Tetsu Nagamatsu, Shinichi Yoshioka, Kojiro Suzuki,
Fumihiko Sano, Masayuki Norishima, Masayuki Murota, Makoto Kako, Masaaki Kinugawa, Member, IEEE,
Masakazu Kakumu, Member, IEEE, and Takayasu Sakurai, Member, IEEE



Abstract: A 4-mm², two-dimensional (2-D) 8×8 discrete
cosine transform (DCT) core processor for HDTV-resolution
video compression/decompression in a 0.3-µm CMOS triple-well,
double-metal technology operates at 150 MHz from a 0.9-V power
supply and consumes 10 mW, only 2% of the power dissipation of a
previous 3.3-V design. Circuit techniques for dynamically varying
threshold voltage (VT scheme) are introduced to reduce active
power dissipation with negligible overhead in speed, standby
power dissipation, and chip area. A way to explore the VDD-Vth
design space is also studied.



I. Introduction

Lowering both the supply voltage VDD and the threshold
voltage Vth enables high-speed, low-power operation [1],
[2]. This approach, however, raises two problems [3], [4]: 1)
degradation of worst-case speed due to Vth fluctuation in low
VDD, and 2) increase in standby power dissipation in low Vth.
To solve these problems, several schemes have been proposed. A
self-adjusting threshold voltage (SAT) scheme [5] reduces Vth
fluctuation in an active mode by adjusting substrate bias with
a feedback control circuit. A standby power reduction (SPR)
scheme [6] raises Vth in a standby mode by switching substrate
bias between the power supply and an external additional
supply higher than VDD or lower than GND. A multi-threshold
voltage CMOS (MT-CMOS) scheme [7] employs low Vth
for fast circuit operation and high Vth for providing and
cutting internal supply voltage. The SAT and the SPR are both
based upon the same idea that Vth is controlled dynamically
through substrate bias. However, the two schemes cannot be
combined because the SPR requires the external supply for
the substrate bias while the SAT generates the substrate bias
internally. The MT-CMOS does not solve the first problem.
It requires very large transistors for the internal power supply
control, which impose area and yield penalties; smaller
transistors would degrade circuit speed. Furthermore, it cannot be
applied to memory elements without circuit tricks which add
further area and speed penalties.

This paper presents a variable threshold voltage scheme (VT
scheme) which solves these two problems in a unified way
by controlling substrate bias with substrate bias
feedback control circuits. Unlike the conventional approaches,
it requires no external power supply for the substrate bias,
leaves no restriction in use, imposes practically no penalty
in speed and chip area, and can be applied to both logic
gates and memory elements. The VT scheme is employed
in a two-dimensional (2-D) 8×8 discrete cosine transform
(DCT) core processor for portable HDTV-resolution video
compression/decompression. This DCT in a 0.3-µm CMOS
technology operates at 150 MHz from a 0.9-V power supply
and consumes 10 mW, only 2% of the power dissipation of a
previous 3.3-V design [8].

In Section II, the low-VDD, low-Vth design space is explored
to investigate the Vth target. In Section III, the VT scheme is
presented, followed by descriptions of circuit implementations
in Section IV. Section V details the design of the DCT.
Experimental results appear in Section VI. Section VII is
dedicated to conclusions.

II. Exploring Low-VDD, Low-Vth Design Space
CMOS power dissipation is given by

$$P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{th}/S} \cdot V_{DD} \qquad (1)$$

where p_t is the switching probability, f_CLK is the clock
frequency, C_L is the load capacitance, S is the subthreshold
swing, and I_0 is a constant which is proportional to total
transistor width in a chip. The first term represents dynamic
power dissipation due to charging and discharging of the load
capacitance, and the second term is leakage current dissipation
due to subthreshold conduction. Since the dominant term in
a typical CMOS design is the dynamic power dissipation,
lowering VDD is effective for low-power design.
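
As a rough illustration of Eq. (1), the following Python sketch evaluates the dynamic and leakage terms separately. Only the 150-MHz clock and the 0.9-V and 3.3-V supplies come from the text; the switching probability, capacitance, leakage prefactor, subthreshold swing, and the two Vth values are hypothetical numbers chosen for illustration, and the function name `cmos_power` is not from the paper.

```python
def cmos_power(p_t, f_clk, c_load, v_dd, i_0, v_th, swing):
    """Return (dynamic, leakage) power in watts, following Eq. (1)."""
    dynamic = p_t * f_clk * c_load * v_dd ** 2
    leakage = i_0 * 10.0 ** (-v_th / swing) * v_dd
    return dynamic, leakage


# Hypothetical parameters (assumptions, not values from the paper).
P_T = 0.3         # switching probability
F_CLK = 150e6     # clock frequency in Hz (150 MHz, as in the text)
C_LOAD = 100e-12  # total load capacitance in F
I_0 = 1e-3        # leakage prefactor in A
SWING = 0.1       # subthreshold swing in V per decade

for v_dd, v_th in [(3.3, 0.7), (0.9, 0.2)]:
    dyn, leak = cmos_power(P_T, F_CLK, C_LOAD, v_dd, I_0, v_th, SWING)
    print(f"VDD={v_dd} V, Vth={v_th} V: "
          f"dynamic={dyn * 1e3:.3f} mW, leakage={leak * 1e6:.3f} uW")

# With all other factors fixed, the dynamic term scales as VDD squared.
ratio = (0.9 / 3.3) ** 2
print(f"dynamic power ratio, 0.9 V vs 3.3 V: {ratio:.3f}")
```

Under these assumptions the quadratic dependence alone reduces the dynamic term to about 7% when the supply drops from 3.3 V to 0.9 V, while the leakage term grows as Vth is lowered.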

Gate propagation delay, on the other hand, is approximately
given in [9] by

$$t_{pd} = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^{\alpha}} \qquad (2)$$
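
The sensitivity of delay to Vth fluctuation at low VDD can be sketched numerically. The Python snippet below assumes that the delay expression has the alpha-power-law form k·C_L·VDD/(VDD − Vth)^α; the exponent value 1.3, the nominal Vth of 0.3 V, and the ±0.1-V fluctuation are assumptions made for illustration, and the function names are hypothetical.

```python
def gate_delay(k, c_load, v_dd, v_th, alpha=1.3):
    """Gate delay under an assumed alpha-power-law model."""
    if v_dd <= v_th:
        raise ValueError("model only applies for VDD > Vth")
    return k * c_load * v_dd / (v_dd - v_th) ** alpha


def delay_spread(v_dd, v_th, delta, alpha=1.3):
    """Ratio of slowest to fastest delay when Vth varies by +/- delta."""
    slow = gate_delay(1.0, 1.0, v_dd, v_th + delta, alpha)
    fast = gate_delay(1.0, 1.0, v_dd, v_th - delta, alpha)
    return slow / fast


# Same Vth fluctuation, two supply voltages (illustrative values).
for v_dd in (3.3, 0.9):
    spread = delay_spread(v_dd, v_th=0.3, delta=0.1)
    print(f"VDD={v_dd} V: slow/fast delay ratio = {spread:.2f}")
```

With these assumed numbers the same Vth fluctuation produces a much wider delay spread at 0.9 V than at 3.3 V, which is the first of the two problems named in the introduction.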
Fig. 1. Exploring low-VDD, low-Vth design space. Contour lines in terms
of speed (broken lines) and power (solid lines) are drawn.

Fig. 2. Variable threshold-voltage (VT) scheme.

... 100 µA from the substrate to lower VBB using a 50-MHz ring
oscillator. This current is large enough for VBB to settle down
within 10 µs after a power-on. When VBB goes lower than
Vactive(+), the pump driving frequency drops to 5 MHz and
the SSB draws 10 µA to control VBB more precisely. The
SSB stops when VBB drops below Vactive. VBB, however,
rises gradually due to device leakage current through MOS
transistors and junctions, and reaches Vactive to activate the
SSB again. In this way, VBB is controlled at Vactive by the
on-off control of the SSB. When VBB goes deeper than
Vactive(-), the SCI turns on to inject 30 mA into the substrate.
Therefore, even if VBB jumps beyond Vactive(+) or Vactive(-)
due to a power line bump, for example, VBB is quickly
recovered to Vactive by the SSB and the SCI. When the "SLEEP"
signal is asserted ("1") to go to the standby mode, the SCI is
disabled, the SSB is activated again, and 100 µA is
drawn from the substrate until VBB reaches Vstandby. VBB is
controlled at Vstandby in the same way by the on-off control
of the SSB. When the "SLEEP" signal becomes "0" to go back to
the active mode, the SSB is disabled and the SCI is activated.
The SCI injects 30 mA into the substrate until VBB
reaches Vactive(-). VBB is finally set at Vactive. In this way,
the SSB is mainly used for a transition from the active mode
to the standby mode, while the SCI is used for a transition
from the standby to the active mode. An active-to-standby
mode transition takes about 100 µs, while a standby-to-active
mode transition is completed in 0.1 µs. This "slow falling
asleep but fast awakening" feature is acceptable for most
applications.
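
The control behavior described above can be summarized as a small decision function. The Python sketch below is only a behavioral model of which block acts for a given VBB and SLEEP state; the threshold voltages are hypothetical placeholders, the names `Thresholds` and `vt_control` are invented for this illustration, and the currents are the 100-µA, 10-µA, and 30-mA figures quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Substrate-bias thresholds in volts (hypothetical values)."""
    active_plus: float = -0.4   # Vactive(+), shallowest threshold
    active: float = -0.5        # Vactive, target in the active mode
    active_minus: float = -0.6  # Vactive(-), deeper than the target
    standby: float = -3.3       # Vstandby, target in the standby mode


def vt_control(v_bb, sleep, th=Thresholds()):
    """Return (block, current in A) selected for the present VBB and SLEEP."""
    if sleep:
        # Standby: the SCI is disabled; the SSB pumps until VBB reaches Vstandby.
        return ("SSB", 100e-6) if v_bb > th.standby else ("idle", 0.0)
    if v_bb > th.active_plus:
        return ("SSB", 100e-6)  # far too shallow: fast pumping
    if v_bb > th.active:
        return ("SSB", 10e-6)   # near the target: slow, precise pumping
    if v_bb >= th.active_minus:
        return ("idle", 0.0)    # inside the window: leakage slowly raises VBB
    return ("SCI", 30e-3)       # too deep: inject charge into the substrate


for v_bb, sleep in [(0.0, False), (-0.45, False), (-0.55, False),
                    (-3.3, False), (-1.0, True)]:
    print(v_bb, sleep, vt_control(v_bb, sleep))
```

The asymmetry between the 30-mA injection and the 100-µA pumping is what makes waking up far faster than falling asleep in this model.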

The SSB operates intermittently to compensate for the
voltage fluctuation in the substrate due to the substrate current
in the active and the standby modes. It therefore consumes
several microamperes in the active mode and less than one
nanoampere in the standby mode, both much lower than
the chip power dissipation. Energy required to charge and
discharge the substrate for switching between the active and
the standby modes is less than 10 nJ. Even when the mode
is switched 1000 times in a second, the power dissipation
becomes only 10 µW. The leakage current monitor should
be designed to dissipate less than 1 nA because it always
works even in the standby mode. The low-power circuit design
technique is described in the next section.
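
The mode-switching overhead quoted above is a simple energy-times-rate product, which the following Python lines check. The 10-nJ energy, the 1000 switches per second, and the 10-mW chip power are taken from the text; the function name is hypothetical.

```python
def mode_switch_power(energy_per_switch_j, switches_per_second):
    """Average power in watts spent on switching between modes."""
    return energy_per_switch_j * switches_per_second


ENERGY_PER_SWITCH_J = 10e-9  # upper bound from the text: 10 nJ
SWITCHES_PER_SECOND = 1000   # switching rate from the text
CHIP_POWER_W = 10e-3         # active power of the DCT: 10 mW

overhead_w = mode_switch_power(ENERGY_PER_SWITCH_J, SWITCHES_PER_SECOND)
print(f"switching overhead: {overhead_w * 1e6:.1f} uW")
print(f"fraction of chip power: {overhead_w / CHIP_POWER_W * 100:.2f} %")
```

Even at this switching rate the overhead stays near one thousandth of the 10-mW active power.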

IV. Circuit Implementations

A. Leakage Current Monitor (LCM)

The substrate bias is generated by the SSB, which is
controlled by the leakage current monitor (LCM). The LCM is
therefore a key to the accurate control in the VT scheme. Fig. 5
Fig. 7. Pump circuit in SSB.




Compared with the conventional LCM where Vt is generated
by dividing the VDD-GND voltage with high-impedance
resistors, the Vth controllability including the static and
dynamic effects is improved from ±0.05 V to less than ±0.01 V,
response delay is shortened from 0.6 to 0.1 µs, and Si area is
reduced from 33 250 to 670 µm². This layout area reduction
is brought about by the elimination of the high-impedance
polysilicon resistors.

B. Self-Substrate Bias Circuit (SSB) 

Fig. 7 depicts a schematic diagram of a pump circuit in
the SSB. PMOS transistors of the diode configuration are
connected in series, and their intermediate nodes are driven by
two signals, Φ1 and Φ2, in 180° phase shift. Every other
transistor, therefore, sends current alternately from p-well to
GND, resulting in lower p-well bias than GND. The SSB can
pump as low as -4.5 V. SSB circuits are widely used in
DRAM's and E²PROM's, but a circuit two orders of magnitude smaller
can be used in the VT scheme. The driving current of
the SSB is 100 µA, while it is usually several milliamperes in
DRAM's. This is because substrate current generation due to
impact ionization is a strong function of the supply voltage.
Substrate current in a 0.9-V DCT is considerably smaller than
that in a 3.3-V design. Substrate current introduced from I/O
pads does not affect the DCT macro because it is separated
from peripheral circuits by a triple-well structure. Eventually,
no substrate current is generated in the standby mode. For
these reasons, the pumping current in the SSB can be as small
as several percent of that in DRAM's. Silicon area is also
reduced considerably. Another concern about the SSB is an
initialization time after a power-on. Even in a 10-mm square
chip, VBB settles down within 200 µs after a power-on, which
is acceptable in real use.

C. Substrate Charge Injector (SCI)

In the VT scheme, care should be taken so that no
transistor sees high-voltage stress of gate oxide and junctions.
Transistors are optimized for use at 3.3 V. The gate oxide
thickness is 8 nm. The maximum voltage that assures sufficient
reliability of the gate oxide is VDD + 20%, or 4 V. The SCI in
Fig. 8 receives a control signal that swings between VDD and
GND at node N1 to drive the substrate from Vstandby to Vactive.
In the standby-to-active transition, VDD + |Vstandby|, which is
about 6.6 V at maximum, can be applied between N1 and N2.
However, as shown in SPICE simulated waveforms in Fig. 8,
|VGS| and |VGD| of M1 and M2 never exceed the larger of
VDD and |Vstandby|. All other transistors in the VT circuit and
the DCT macro receive (VDD - Vth) on their gate oxide when
the channel is formed in the depletion and the inversion mode,
and less than |Vstandby| in the accumulation mode. These
considerations lead to a general guideline that Vstandby should
be limited to -(VDD + 20%). Vstandby of -(VDD + 20%),
however, can shift Vth enough to reduce the leakage
current in the standby mode. The body effect coefficient, γ,
can be adjusted independently of Vth by controlling the doping
concentration density in the channel-substrate depletion layer.

Fig. 8. SCI and its waveforms simulated by SPICE.

Fig. 9. DCT block diagram.
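
To give a feel for how much a standby bias within the oxide-stress guideline can suppress leakage, the Python sketch below combines the guideline with the standard textbook body-effect relation, ΔVth = γ(√(2φF + |VBB|) − √(2φF)), and the 10^(−Vth/S) leakage dependence of Eq. (1). The body-effect relation is a swapped-in textbook model, not a formula taken from this paper, and the values of γ, 2φF, and S are hypothetical.

```python
import math


def standby_bias_limit(v_nominal, margin=0.2):
    """Deepest allowed Vstandby: -(V + 20%) for the voltage the oxide is rated at."""
    return -(1.0 + margin) * v_nominal


def vth_shift(reverse_bias, gamma, two_phi_f):
    """Threshold shift in volts for a reverse substrate bias magnitude (textbook model)."""
    return gamma * (math.sqrt(two_phi_f + reverse_bias) - math.sqrt(two_phi_f))


def leakage_ratio(delta_vth, swing):
    """Factor by which subthreshold leakage changes when Vth rises by delta_vth."""
    return 10.0 ** (-delta_vth / swing)


# Hypothetical device parameters (assumptions for illustration only).
GAMMA = 0.3      # body effect coefficient in V^0.5
TWO_PHI_F = 0.6  # surface potential term in V
SWING = 0.1      # subthreshold swing in V per decade

limit = standby_bias_limit(3.3)  # transistors optimized for 3.3 V
shift = vth_shift(abs(limit), GAMMA, TWO_PHI_F)
print(f"Vstandby limit: {limit:.2f} V")
print(f"Vth shift at the limit: {shift:.3f} V")
print(f"leakage scaled by: {leakage_ratio(shift, SWING):.2e}")
```

A larger γ gives a larger shift for the same bias, which is why the ability to set γ independently of Vth matters for the standby mode.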

V. DCT Design 

A. Circuit Design 

This DCT core processor executes 2-D 8×8 DCT and
inverse DCT. A block diagram is illustrated in Fig. 9. The DCT
is composed of two one-dimensional (1-D) DCT and inverse
DCT processing units and a transposition RAM. Rounding
circuits and clipping circuits which prevent overflow and
underflow are also implemented in the cell. The DCT has
a concurrent architecture based on distributed arithmetic and
a fast DCT algorithm, which enables high-throughput DCT
processing of one pixel per clock. It also has a fully pipelined
structure. The 64 input data sampled in every clock cycle are
output after a latency of 112 clock cycles.
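
The row-column organization described above (a 1-D transform, a transposition, and a second 1-D transform) can be sketched in software. The Python code below is a plain floating-point reference using direct cosine sums with orthonormal scaling; it does not model the distributed arithmetic, the fast DCT algorithm, the rounding and clipping circuits, or the pipelining of the actual core, and all function names are hypothetical.

```python
import math

N = 8  # block size of the 8x8 transform


def _scale(k):
    """Orthonormal scale factor for coefficient index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def idct_1d(c):
    """Inverse of dct_1d (orthonormal 1-D DCT-III)."""
    return [sum(_scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def transpose(block):
    """Software stand-in for the transposition RAM."""
    return [list(col) for col in zip(*block)]


def transform_2d(block, transform):
    """Apply a 1-D transform to rows, transpose, apply again, transpose back."""
    rows_done = [transform(row) for row in block]
    cols_done = [transform(row) for row in transpose(rows_done)]
    return transpose(cols_done)


flat = [[1.0] * N for _ in range(N)]
coeffs = transform_2d(flat, dct_1d)
restored = transform_2d(coeffs, idct_1d)
print(f"DC coefficient of a flat block: {coeffs[0][0]:.3f}")
```

A flat block concentrates all of its energy in the DC coefficient, and applying the inverse transform recovers the input to within floating-point error.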

Various memories which use the same low-Vth transistors as
logic gates are employed in the DCT. Table lookup ROM's (16
b × 32 words × 16 banks) employ contact programming and
an inverter-type sense-amplifier. Single-port SRAM's (16 b
× 64 words × 2 banks) and dual-port SRAM's (16 b × 8
words × 2 banks) employ a six-transistor cell and a latch
sense-amplifier. They all exhibit wide operational margin in
Fig. 11. DCT layout modification for the VT scheme: (a) device
cross-section, (b) p-well (one island), and (c) n-well (pieces of islands)
in deep n-well.

even when 100 kΩ resistance is added between the substrate
and the output of the SSB.

VII. Conclusions 

A 4-mm² 2-D DCT core processor for portable multimedia
equipment with HDTV-resolution video compression and
decompression has been developed in a 0.3-µm CMOS, triple-well,
double-metal technology. It operates at 150 MHz from
a 0.9-V power supply and dissipates 10 mW, which is only
2% of the previous 3.3-V design. Circuit design techniques
for dynamically varying threshold voltage (VT scheme) are
introduced to reduce active power dissipation with negligible
overhead in speed, standby power dissipation, and chip area.
The active-to-standby mode transition takes 120 µs, while the
standby-to-active mode transition is completed within 0.2 µs.
The VT scheme can be applied to both logic gates and memory
elements. On-chip generation of the low-voltage VDD is left
for future research.

Fig. 12. Chip micrograph: (a) DCT macro and (b) VT circuits.